Improving the Naive Bayes Classifier via a Quick Variable Selection Method Using Maximum of Entropy
Authors
Abstract
Variable selection methods play an important role in the field of attribute mining. The Naive Bayes (NB) classifier is a simple and popular classification method that yields good results in a short processing time, which makes it well suited to very large datasets. However, its performance depends heavily on the relationships between the variables. The Info-Gain (IG) measure, which is based on general entropy, can be used as a quick variable selection method. This measure ranks the importance of the attribute variables with respect to a variable under study, using the information obtained from a dataset. Its main drawback is that it is always non-negative, so an information threshold must be set for each dataset in order to select the set of most important variables. We introduce here a new quick variable selection method that generalizes the method based on the Info-Gain measure. It uses imprecise probabilities and the maximum entropy measure to select the most informative variables without setting a threshold. This new variable selection method, combined with the Naive Bayes classifier, improves the original method and provides a valuable tool for handling datasets with a very large number of features and a huge amount of data, where more complex methods are not computationally feasible.
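The contrast between the two selection rules can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how the imprecise probabilities are built, so the code assumes the imprecise Dirichlet model with a hypothetical parameter `s = 1`, weights each attribute value by its empirical frequency, and uses function names (`info_gain`, `max_entropy_idm`, `imprecise_info_gain`, `select_variables`) invented for this example.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def class_counts_by_value(xs, ys, classes):
    """For each attribute value, count every class (zero counts included)."""
    table = {}
    for x, y in zip(xs, ys):
        table.setdefault(x, Counter())[y] += 1
    return {x: [cnt[c] for c in classes] for x, cnt in table.items()}


def info_gain(xs, ys):
    """Classical Info-Gain: H(C) - sum_x P(x) * H(C | X = x)."""
    n = len(ys)
    classes = sorted(set(ys))
    prior = Counter(ys)
    h_class = entropy([prior[c] / n for c in classes])
    h_cond = 0.0
    for counts in class_counts_by_value(xs, ys, classes).values():
        n_x = sum(counts)
        h_cond += (n_x / n) * entropy([c / n_x for c in counts])
    return h_class - h_cond


def max_entropy_idm(counts, s=1.0):
    """Maximum entropy over the probability set {(n_i + s*t_i) / (N + s)}.

    Assumption: imprecise Dirichlet model. The extra mass s is poured onto
    the smallest counts first (water-filling), which gives the distribution
    in the set that is closest to uniform.
    """
    levels = sorted(float(c) for c in counts)
    remaining = s
    k = 1  # number of entries currently sharing the lowest level
    while k < len(levels) and remaining > 0:
        gap = (levels[k] - levels[0]) * k  # mass needed to reach next level
        if gap > remaining:
            break
        remaining -= gap
        for i in range(k):
            levels[i] = levels[k]
        k += 1
    for i in range(k):
        levels[i] += remaining / k
    total = sum(levels)
    return entropy([v / total for v in levels])


def imprecise_info_gain(xs, ys, s=1.0):
    """Info-Gain with every entropy replaced by its maximum-entropy value."""
    n = len(ys)
    classes = sorted(set(ys))
    prior = Counter(ys)
    h_class = max_entropy_idm([prior[c] for c in classes], s)
    h_cond = 0.0
    for counts in class_counts_by_value(xs, ys, classes).values():
        h_cond += (sum(counts) / n) * max_entropy_idm(counts, s)
    return h_class - h_cond


def select_variables(columns, ys, s=1.0):
    """Keep the attributes whose imprecise gain is positive (no threshold)."""
    return [name for name, xs in columns.items()
            if imprecise_info_gain(xs, ys, s) > 0]


# Toy data (made up for illustration): 6 examples of class 0, 2 of class 1.
ys = [0, 0, 0, 0, 0, 0, 1, 1]
columns = {
    "good": ["a"] * 6 + ["b"] * 2,                       # mirrors the class
    "noise": ["a", "a", "a", "b", "b", "b", "a", "b"],   # same class mix in a and b
}
for name, xs in columns.items():
    print(name, round(info_gain(xs, ys), 3),
          round(imprecise_info_gain(xs, ys), 3))
print(select_variables(columns, ys))
```

On this toy data the classical Info-Gain of the `noise` attribute is zero, the lowest value it can take, whereas the imprecise variant is negative because the small per-value samples are pulled toward uniformity. Under these assumptions, the rule "keep the attribute if its gain is positive" needs no dataset-specific threshold, and the retained attributes can then be passed to any Naive Bayes implementation.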
Similar resources
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...
Accuracy comparison between gene selection methods using NAIVE Bayes classifier for the microarray data of JEV infected Mus Musculus brain cells
Japanese Encephalitis is the most important cause of epidemic encephalitis worldwide. From the reports by various sources, about 68,000 cases of Japanese encephalitis (JE) are estimated to occur each year [1]. A vaccine is available for Japanese encephalitis, which utilizes effectively killed inoculated bacteria, but it is expensive and requires a primary vaccination followed by two successive ...
Incremental Weighted Naive Bayes Classifiers for Data Stream
A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem with a naive independence assumption. The explanatory variables (Xi) are assumed to be conditionally independent given the target variable (Y). Despite this strong assumption, this classifier has proved to be very effective in many real applications and is often used on data streams for supervised classification. The n...
A Survey Paper On Naive Bayes Classifier For Multi-Feature Based Text Mining
Text mining is a variant of the field called data mining. Text mining, also referred to as “Text Analytics”, is used to make unstructured data workable by the computer. Text categorization, also called topic spotting, is the task of automatically classifying a set of documents into groups from a predefined set. Text classification is an essential application and research topic because of incr...
Using Maximum Entropy For Sentence Extraction
A maximum entropy classifier can be used to extract sentences from documents. Experiments using technical documents show that such a classifier tends to treat features in a categorical manner. This results in performance that is worse than when extracting sentences using a naive Bayes classifier. Addition of an optimised prior to the maximum entropy classifier improves performance over and above th...
Journal title:
- Entropy
Volume 19, Issue
Pages -
Publication date 2017